BACKGROUND OF THE INVENTION
[0012] 1. Field of the Invention 

[0013] This invention pertains generally to data compression methods and
systems, and more particularly to an efficient scalable predictive coding 
method and system where most or all of the information available to the 
enhancement-layer is exploited to improve the quality of the prediction. 

[0014] 2. Description of the Background Art 

[0015] Many applications require data, such as video, to be simultaneously
decodable at a variety of rates. Examples include applications involving 
broadcast over differing channels, multicast in a complex network where the 
channels/links dictate the feasible bit rate for each user, the co-existence of 
receivers of different complexity (and cost), and time-varying channels. An 
associated compression technique is "scalable" if it offers a variety of 
decoding rates using the same basic algorithm, and where the lower rate
information streams are embedded in the higher rate bit-streams in a manner 
that minimizes redundancy. 

[0016] A predictive coding system for encoding and decoding a signal without 
scalability is well-known in the literature of signal compression. (See for 
example: predictive vector quantization [6], and motion-compensated 
predictive transform coding of video [3]). In such predictive coding systems 
the encoder includes a decoder and memory so that what is actually encoded 
is the difference between the input signal and a predicted version of the 
reproduced signal, this difference signal being called the residual. The 
decoder contains a prediction loop whereby the current residual frame is 
decoded and then it is added to a prediction of the current frame obtained 
from the previous reproduced frame. In some cases, the predictor uses 
several prior frames to predict the current frame. 
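
By way of illustration only, the prediction loop described above may be sketched as follows. The sketch assumes a scalar signal, a first-order predictor that scales the previous reconstruction by a factor `rho`, and a uniform quantizer with step `step`; the function names and these choices are hypothetical and are not taken from the cited systems.

```python
def quantize(value, step):
    """Uniform quantizer (assumed for illustration): nearest multiple of step."""
    return step * round(value / step)


def predictive_encode(samples, step, rho=1.0):
    """Closed-loop predictive encoder for a scalar sequence.

    The encoder runs the decoder's recursion internally, so each prediction
    is formed from the previous reproduced sample, not the original one.
    Returns (quantized residuals, reconstructions the decoder will obtain).
    """
    prev_recon = 0.0
    residuals, recon = [], []
    for x in samples:
        prediction = rho * prev_recon           # predicted version of x(n)
        r_hat = quantize(x - prediction, step)  # quantized residual
        prev_recon = prediction + r_hat         # reproduced sample
        residuals.append(r_hat)
        recon.append(prev_recon)
    return residuals, recon


def predictive_decode(residuals, rho=1.0):
    """Decoder prediction loop: add each residual to the prediction."""
    prev_recon = 0.0
    recon = []
    for r_hat in residuals:
        prev_recon = rho * prev_recon + r_hat
        recon.append(prev_recon)
    return recon
```

Because the encoder predicts from reproduced samples, the decoder obtains exactly the encoder's reconstructions, and the error in each sample is bounded by half the quantizer step.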

[0017] A major difficulty encountered in scalable predictive coding is how to
take advantage of the additional information available to the
enhancement-layer decoder for improved prediction without causing undesired conflicts with
the information obtained from the base layer. FIG. 1 depicts a two-layer 
scalable coding system 10 where it is assumed that the original input signal 
(e.g., an audio or video signal) is segmented into frames that are sequentially 
encoded. Typical examples are video frames and speech frames, but
"frame" here will also cover the degenerate case of a single sample as in
differential pulse code modulation (DPCM). The term "frame" as used herein
refers either to a group of contiguous samples of an original input signal or a 
set of parameters extracted from the original group of samples (such as a set 
of transform coefficients obtained by a discrete-cosine transform (DCT) 
operation on the original group of samples) and in each case the terminology 
"frame" or "signal" will be used to refer to this entity that is representative of 
the original group of samples or is itself the original group of samples. 

[0018] The input frame 12, $x(n)$, is compressed by the base encoder (BE) 14,
which produces the base bit-stream 16. The enhancement-layer encoder
(EE) 18 has access to the input frame 12 and to any information produced by
or available to BE 14. EE 18 uses this data to generate the
enhancement-layer bit-stream 20. A base decoder (BD) 22 receives the base
bit-stream 16 and produces a reconstruction 24, $\hat{x}_b(n)$, while the
enhancement-layer decoder (ED) 26 has access to both bit-streams and produces
an enhanced reconstruction 28, $\hat{x}_e(n)$. The reconstruction frames that
are available at the decoder are used to predict or estimate the current
frame. Note that ED 26 has access to both bit-streams and hence effectively
has access to both the reconstruction frame at the base layer,
$\hat{x}_b(n)$, and the previous reconstructed frame at the enhancement
layer, $\hat{x}_e(n-1)$, while BD 22 has access only to the previous
reconstructed frame at the base layer, $\hat{x}_b(n-1)$, which is stored in
the memory within BD. In the case of a scalable coding system with multiple
enhancement layers, an enhancement layer decoder may have access to the
reconstruction frames from lower enhancement layers as well as from the
base layer. The prediction loop (internal to the operation of BD as in any
predictive coding system but not shown in the figure) in this configuration
causes severe difficulties in the design of scalable coding. Accordingly, a
number of approaches to scalable coding have been developed. These
include:

[0019] (1) The standard approach: At the base layer, BE 14 compresses the
residual

$$r_b(n) = x(n) - P[\hat{x}_b(n-1)],$$

where $P$ denotes the predictor (e.g., motion compensator in the case of video
coding). Note that for notational simplicity we assume first-order prediction,
but in general several previous frames may be used. BD 22 produces the
reconstruction

$$\hat{x}_b(n) = P[\hat{x}_b(n-1)] + \hat{r}_b(n),$$

where $\hat{r}_b(n)$ is the compressed-reconstructed residual. At the
enhancement-layer, EE 18 compresses the base layer's reconstruction error

$$r_e^{(1)}(n) = x(n) - \hat{x}_b(n) = x(n) - P[\hat{x}_b(n-1)] - \hat{r}_b(n).$$

The enhancement-layer reconstruction is

$$\hat{x}_e(n) = \hat{x}_b(n) + \hat{r}_e^{(1)}(n) = P[\hat{x}_b(n-1)] + \hat{r}_b(n) + \hat{r}_e^{(1)}(n).$$

See, e.g., [1]. A deficiency of this approach is that no advantage is taken of
the potentially superior prediction due to the availability of
$\hat{x}_e(n-1)$ at the ED 26.

[0020] (2) The separate coding approach: BE 14 compresses $r_b(n)$ as
above, but EE 18 compresses the "enhancement-only" prediction error

$$r_e^{(2)}(n) = x(n) - P[\hat{x}_e(n-1)]$$

directly. The enhancement-layer reconstruction is

$$\hat{x}_e(n) = P[\hat{x}_e(n-1)] + \hat{r}_e^{(2)}(n).$$

A deficiency of this approach is that, while the approach takes advantage of
information available only to the enhancement-layer, it does not exploit the
knowledge of $\hat{r}_b(n)$, which is also available at the enhancement-layer.
The two layers are, in fact, separately encoded except for savings on overhead
information which need not be repeated (such as motion vectors in video
coding) [2].

[0021] (3) Layer-specific prediction at the decoder approach: BD 22
reconstructs the frame as

$$\hat{x}_b(n) = P[\hat{x}_b(n-1)] + \hat{r}_b(n),$$

and ED 26 reconstructs as

$$\hat{x}_e(n) = P[\hat{x}_e(n-1)] + \hat{r}_b(n) + \hat{r}_e(n).$$

However, the encoders BE 14 and EE 18 use the same prediction [3], and the
options are:

[0022] (a) Both encoders use base-layer prediction $P[\hat{x}_b(n-1)]$. This
results in drift of the enhancement-layer decoder. (The term "drift" refers to
a form of mismatch where the decoder uses a different prediction than the one
assumed by the encoder. This mismatch tends to grow as the "corrections"
provided by the encoder are misguiding; hence, the decoder "drifts away".)

[0023] (b) Both encoders use enhancement-layer prediction
$P[\hat{x}_e(n-1)]$. This results in drift of the base-layer decoder.

[0024] (4) Switch between approaches (1) and (2) on a per frame or per block
basis [4], or per sample [5]. This approach has the deficiencies of either
approach (1) or (2) as described above, at each time depending on the
switching decision.
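
For illustration only, approaches (1) and (2) may be contrasted in a short sketch. It assumes scalar frames, an identity predictor $P[v] = v$, and uniform quantizers; the function names and parameters are hypothetical. It shows which quantity each enhancement encoder compresses, and that each approach uses only one of the two information sources.

```python
def quantize(value, step):
    """Uniform quantizer (assumed for illustration): nearest multiple of step."""
    return step * round(value / step)


def two_layer_step(x, prev_base, prev_enh, base_step, enh_step, approach):
    """Code one scalar frame with a base layer and one enhancement layer.

    prev_base / prev_enh are the previous base-layer and enhancement-layer
    reconstructions; the predictor is the identity (an assumption).
    Returns (base reconstruction, enhancement reconstruction).
    """
    # Base layer: quantize the prediction error against the base loop.
    r_b_hat = quantize(x - prev_base, base_step)
    x_b_hat = prev_base + r_b_hat

    if approach == 1:
        # Standard approach: code the base layer's reconstruction error.
        r_e_hat = quantize(x - x_b_hat, enh_step)
        x_e_hat = x_b_hat + r_e_hat
    elif approach == 2:
        # Separate coding: code the enhancement-only prediction error.
        r_e_hat = quantize(x - prev_enh, enh_step)
        x_e_hat = prev_enh + r_e_hat
    else:
        raise ValueError("approach must be 1 or 2")
    return x_b_hat, x_e_hat
```

In approach (1) the previous enhancement reconstruction `prev_enh` is never consulted, and in approach (2) the base residual plays no part in the enhancement reconstruction.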

BRIEF SUMMARY OF THE INVENTION 

[0025] The present invention addresses the prediction loop deficiencies in 
conventional scalable coding methods and systems in a way that achieves 
efficient scalability of predictive coding. The approach is generally applicable 
and may, in particular, be applied to standard video and audio compression. 
In the present invention, most or all of the information available at an 
enhancement-layer may be exploited to improve the quality of the prediction. 

[0026] By way of example, and not of limitation, in the present invention the
current frame is predicted at the enhancement-layer by processing and 
combining the reconstructed signal representing: (i) the current base-layer (or 
lower layers) frame; and (ii) the previous enhancement-layer frame. The 
combining rule takes into account the compressed prediction error of the 
base-layer, and the parameters used for its compression. The main difficulty 
overcome by this invention is in the apparent conflicts between these two 
sources of information and their impact as described in the Background of the 
Invention. This difficulty may explain why existing known methods 
exclusively use one of these information sources at any given time. These 
methods will be generally referred to here as switching techniques (which 
include as a special case the exclusive use of one of the information sources 
at all times). Additionally, the invention optionally includes a special 
enhancement-layer synchronization mode for the case where the 
communication rate for a given receiver is time varying (e.g., in mobile 
communications). This mode may be applied periodically to allow the receiver 
to upgrade to enhancement-layer performance even though it does not have 
prior enhancement-layer reconstructed frames. 

[0027] An object of the invention is to achieve efficient scalability of predictive
coding.

[0028] Another object of the invention is to provide a method and system for
scalable predictive coding that is applicable to typical or standard video and
audio compression.

[0029] Another object of the invention is to provide a scalable predictive
coding method and system in which all or most of the information available at
an enhancement-layer is exploited to improve the quality of the prediction.

[0030] Further objects and advantages of the invention will be brought out in
the following portions of the specification, wherein the detailed description is
for the purpose of fully disclosing preferred embodiments of the invention
without placing limitations thereon.

BRIEF DESCRIPTION OF THE DRAWINGS 

[0031] The invention will be more fully understood by reference to the 
following drawings which are for illustrative purposes only: 

[0032] FIG. 1 is a functional block diagram of a conventional two-layer scalable
predictive coding system.

[0033] FIG. 2 is a functional block diagram of an enhancement layer encoder
of a scalable predictive coding system in accordance with the present
invention.

[0034] FIG. 3 is a functional block diagram of a base layer reconstruction
module according to the present invention.

[0035] FIG. 4 is a functional block diagram of an enhancement layer
reconstruction module according to the present invention.

[0036] FIG. 5 is a functional block diagram of a three-layer scalable encoder
employing the enhancement encoder of the present invention.

[0037] FIG. 6 is a functional block diagram of a three-layer scalable decoder
corresponding to the encoder shown in FIG. 5.

[0038] FIG. 7 is a functional block diagram of a two-layer scalable video
encoder employing the enhancement encoder of the present invention.

[0039] FIG. 8 is a functional block diagram of a two-layer decoder
corresponding to the encoder shown in FIG. 7.

[0040] FIG. 9 is a functional block diagram of the spatial motion compensator
blocks shown in FIG. 7 and FIG. 8.

ROS5313.01A2
EV352303566US

DETAILED DESCRIPTION OF THE INVENTION 
[0041] Referring more specifically to the drawings, where like reference
numbers, labels and symbols denote like parts, for illustrative purposes the
present invention will be described with reference to the encoder generally
shown in FIG. 2, as well as the encoding system shown in FIG. 2 through FIG.
6, and the scalable predictive coding method described in connection
therewith. Various embodiments of encoders and decoders employing the
present invention, and details therefor, are shown and described in FIG. 7
through FIG. 9.

[0042] The method of the present invention generally comprises upgrading the
prediction used at each enhancement-layer by combining, with minimal 
conflict, the information provided from both sources, namely, information 
available at, and used by, the base-layer (or lower layers), and information 
that is available only at the enhancement-layer. In the case of a scalable 
predictive coding system with multiple enhancement layers, the prediction at 
an enhancement layer may combine information provided from all lower 
enhancement layers as well. The invention provides for prediction or 
estimation of the signal frame itself in any representation, or any subset of 
signal representation coefficients such as transform coefficients (e.g., in 
video, audio), line spectral frequencies (e.g., in speech or audio), etc. The 
term "frame" and the corresponding mathematical notation will be used 
generally to refer to the relevant set of frame coefficients being estimated or 
predicted by the method in each particular application. 

[0043] Referring first to FIG. 2, a functional block diagram of an enhancement
layer encoder of a scalable predictive coding system in accordance with the
present invention is shown. In the enhancement layer encoder 100 of the
present invention, an enhancement layer estimator (ELE) 102 computes a
new predicted frame 104, $\tilde{x}_e(n)$, by combining information from the
reconstruction frame 106 at the base layer, $\hat{x}_b(n)$, and from the
previous reconstructed frame 108 at the enhancement layer, $\hat{x}_e(n-1)$.
Note that first order prediction is described for notational simplicity but
several previous frames may be used. The combining rule depends on any or all
of, but is not limited to, the following parameters: the compression
parameters 110 of the base layer (such as quantization step and threshold,
and the quantized base-layer residual 112, $\hat{r}_b(n)$ (see FIG. 3)), and
the statistical parameters 114 of the time evolution of the frames (such as
inter-frame correlation coefficients and variance). The statistical
parameters may be either estimated off-line from training data, or estimated
on-line by an adaptive estimator which tracks variation in the signal
statistics based on either the original signal (in which case the parameters
need to be transmitted to the decoder) or based on reconstructed signals
which are available to the receiver. The exact definition of the combination
rule depends on the level of complexity allowed for the module. At the high
end, one may compute a possibly complex, optimal predicted frame given all
the available information. The enhancement layer residual 116, $r_e(n)$,
which is the difference between the input frame 118, $x(n)$, and the
predicted frame 104, $\tilde{x}_e(n)$, is then compressed by a compressor 120
to produce the enhancement bits 122.

[0044] Referring to FIG. 3 through FIG. 6, a complete scalable predictive
coding system for use with this invention is shown. While only three layers
are shown, it will be appreciated that additional layers can be added and are
contemplated within the scope of the invention. FIG. 3 shows a base layer
reconstruction module 124 which receives the quantized base layer residual
112, $\hat{r}_b(n)$, and adds it to the base predicted frame 126,
$\tilde{x}_b(n)$, to produce the base layer reconstruction frame 106,
$\hat{x}_b(n)$. A delay 128 produces a delayed base reconstructed frame 130,
$\hat{x}_b(n-1)$, which is input to the base predictor 132, which computes
the base predicted frame 126, $\tilde{x}_b(n)$, which is needed to
produce the reconstructed frame as explained above.

[0045] The enhancement layer reconstruction module 134 shown in FIG. 4
receives the quantized enhancement layer residual 136, $\hat{r}_e(n)$, and
adds it to the enhancement layer predicted frame 104, $\tilde{x}_e(n)$, to
produce the enhancement layer reconstruction frame 138, $\hat{x}_e(n)$. A
delay 140 produces a delayed enhancement layer reconstructed frame 108,
$\hat{x}_e(n-1)$, which is input to the enhancement layer estimator 102,
which in turn computes the enhancement layer predicted frame 104,
$\tilde{x}_e(n)$, as explained with reference to FIG. 2.
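
By way of example only, the signal flow of FIG. 2 and FIG. 4 for a single scalar coefficient may be sketched as follows. The estimator shown is a placeholder convex blend chosen purely to make the sketch runnable; it is an assumption and is not the combining rule of the invention. The uniform quantizer standing in for compressor 120 and all function names are likewise hypothetical.

```python
def quantize(value, step):
    """Uniform quantizer standing in for compressor 120 (an assumption)."""
    return step * round(value / step)


def blend_estimator(x_hat_b, x_hat_e_prev, weight=0.5):
    """Placeholder combining rule (NOT the rule of the specification):
    a fixed convex blend of the two information sources."""
    return weight * x_hat_b + (1.0 - weight) * x_hat_e_prev


def enhancement_layer_step(x, x_hat_b, x_hat_e_prev, step,
                           estimator=blend_estimator):
    """One frame of the enhancement layer for a scalar coefficient.

    Returns (quantized residual, enhancement-layer reconstruction).
    """
    x_tilde_e = estimator(x_hat_b, x_hat_e_prev)  # ELE 102 -> predicted frame 104
    r_e = x - x_tilde_e                           # enhancement residual 116
    r_hat_e = quantize(r_e, step)                 # quantized residual 136
    x_hat_e = x_tilde_e + r_hat_e                 # reconstruction frame 138
    return r_hat_e, x_hat_e
```

The decoder can repeat the same estimator call because both of its inputs, the current base reconstruction and the previous enhancement reconstruction, are available at the enhancement-layer decoder.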

[0046] FIG. 5 shows how the modules described in FIG. 2 through FIG. 4 may
be combined to obtain a complete scalable predictive encoder. Only three
layers are shown without implying any limitation, as extension to further
layers is obvious and straightforward. Most inputs and outputs were explained
in the context of the previous figures, and to distinguish between the notation
for the first and second enhancement layer signals, the prefix EL1 or EL2 was
added, respectively.

[0047] The signal frame to be compressed (which may be the original raw
signal, or any set of coefficients extracted from it for the purpose of
compression), denoted $x(n)$, is fed to all layers in parallel. In each layer
the predicted frame ($\tilde{x}_b(n)$ in the base layer, (EL1)
$\tilde{x}_e(n)$ in the first enhancement layer, and (EL2) $\tilde{x}_e(n)$
at the second enhancement layer) is subtracted from $x(n)$ to obtain the
prediction error (or residual) at the layer ($r_b(n)$, (EL1) $r_e(n)$, and
(EL2) $r_e(n)$, for the base, first enhancement and second enhancement
layers, respectively). The residual is compressed by the layer's
Compressor/Quantizer, which outputs: the layer's bits for transmission to the
decoder; the reconstructed (quantized) residual ($\hat{r}_b(n)$, (EL1)
$\hat{r}_e(n)$, and (EL2) $\hat{r}_e(n)$, for the base, first enhancement and
second enhancement layers, respectively), as input to the layer's
reconstruction module; and the set of compression parameters for use by a
higher layer. Note that the enhancement layer compressor/quantizer subsumes
the compressor 120 of FIG. 2 as, besides the bit stream, it also outputs the
quantized residual. The reconstruction module of each layer processes its
input signals as per FIG. 3 and FIG. 4, and outputs the reconstructed frame
for the layer ($\hat{x}_b(n)$, (EL1) $\hat{x}_e(n)$, and (EL2)
$\hat{x}_e(n)$, for the base, first enhancement and second enhancement
layers, respectively), and the layer's predicted frame ($\tilde{x}_b(n)$,
(EL1) $\tilde{x}_e(n)$, and (EL2) $\tilde{x}_e(n)$, for the base, first
enhancement and second enhancement layers, respectively).

[0048] The corresponding three layer scalable predictive decoder is shown in 
FIG. 6. Each layer's inverse compressor/quantizer receives as input the 
layer's bit stream from which it reproduces the layer's quantized residual. It 
also extracts the layer's compression parameters for use by a higher layer 
reconstruction module. The rest of the diagram is identical to the encoder of 
FIG. 2 and similarly produces the reconstructed frame at each layer. 

[0049] It will be appreciated that the invention is generally applicable to
predictive coding and, in particular, may be applied to known vector
quantizer-based compression techniques, and known transform-based
techniques. Further, it is applicable to compression of speech, audio, and
video signals. A combining rule employing optimal estimation for scalable
compression is described next as an implementation example of the
invention.

[0050] In typical predictive coding, a number of signal representation
coefficients (e.g., vectors of transform coefficients, line spectral frequencies,
or vectors of raw signal samples) are extracted per frame and quantized
independently. A specific low complexity implementation of the invention
consists of optimally combining the information available for predicting the
coefficient at an enhancement-layer. The reconstructed coefficient at the
base-layer, $\hat{x}_b(n)$, and the quantization interval (or partition region
in the case of vector quantization) of the corresponding reconstructed
residual $\hat{r}_b(n)$, determine an interval/cell $I(n)$ within which the
original coefficient $x(n)$ must lie. From the corresponding reconstructed
coefficient at the previous enhancement-layer frame, $\hat{x}_e(n-1)$, and a
statistical model on time evolution of the coefficients, one may construct a
probability density function for $x(n)$ conditional on $\hat{x}_e(n-1)$,
denoted by $p[x(n) \mid \hat{x}_e(n-1)]$. The optimal estimate of $x(n)$ is
obtained by expectation:

$$\tilde{x}_e(n) = \frac{\int_{I(n)} x\, p[x(n) \mid \hat{x}_e(n-1)]\, dx}{\int_{I(n)} p[x(n) \mid \hat{x}_e(n-1)]\, dx}.$$

This predictor incorporates the information provided by the base-layer
(interval within which $x(n)$ lies), and by the enhancement-layer (probability
distribution of $x(n)$ conditional on $\hat{x}_e(n-1)$).

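
For illustration, the expectation above may be approximated numerically for a scalar coefficient. The sketch below is not part of the specification: it assumes a continuous conditional density (a Laplacian centered on the previous enhancement-layer reconstruction, with a hypothetical decay parameter `alpha`) and uses a simple midpoint rule; the function names are illustrative.

```python
import math


def conditional_mean(pdf, lo, hi, num=2000):
    """Approximate E[x | lo <= x <= hi] under density pdf (midpoint rule)."""
    width = (hi - lo) / num
    weighted_sum = 0.0
    mass = 0.0
    for i in range(num):
        x = lo + (i + 0.5) * width
        w = pdf(x)
        weighted_sum += x * w
        mass += w
    if mass <= 0.0:
        # Density vanishes numerically on the cell: fall back to its midpoint
        # (a choice made for this sketch only).
        return 0.5 * (lo + hi)
    return weighted_sum / mass


def laplacian_pdf_about(center, alpha):
    """Assumed conditional density: Laplacian centered on `center`."""
    return lambda x: 0.5 * alpha * math.exp(-alpha * abs(x - center))
```

For example, with `pdf = laplacian_pdf_about(x_hat_e_prev, alpha)` and the cell limits taken from the base layer, `conditional_mean(pdf, lo, hi)` returns a prediction that always lies inside the cell and is pulled toward the previous enhancement-layer reconstruction.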
[0051] Referring now to FIG. 7 and FIG. 8, a system for scalable predictive
transform coding which is designed for the compression of video signals is
shown. In current practice and standards (e.g., [4]), the system uses motion
compensation for basic frame prediction, applies the discrete cosine transform
(DCT) to the prediction error (residual), and quantizes the transform
coefficients one at a time. A block diagram of a two-layer scalable video
encoder is shown in FIG. 7, and the corresponding decoder is shown in FIG.
8. FIG. 9 shows a functional block diagram corresponding to the spatial
motion compensator blocks shown in the base layer and the enhancement
layer.

[0052] Note that, for simplicity, the symbols $x$, $r$, $\hat{x}$, $\hat{r}$,
$\tilde{x}$ for the video and residual signals at the base and enhancement
layers in the diagram are in the transform domain, even though motion
compensation is performed in the spatial domain (FIG. 9). Note further that
additional enhancement layers may be added where an enhancement layer k
builds on and relates to layer k-1 below it exactly as shown for the first
two enhancement layers.

[0053] The first-order Laplace-Markov process was chosen for modeling the
time evolution statistics of the video signal:

$$x(n) = \rho\, \mathrm{MC}[x(n-1)] + z(n),$$

where $x(n)$ is the DCT coefficient in the current frame and
$\mathrm{MC}[x(n-1)]$ is the corresponding (after motion compensation)
coefficient in the previous frame. The correlation coefficient $\rho$ is
assumed to be nearly one. As $x(n)$ has a Laplacian density, the driving
process, $z(n)$, is zero-mean, white, stationary, and has the density

$$p(z) = \rho^2\, \delta(z) + (1 - \rho^2)\, \frac{\alpha}{2}\, e^{-\alpha |z|}.$$

(Both $\alpha$ and $\rho$ may in practice be estimated "offline" from training
data, or via an adaptive estimator that tracks variations in local statistics
of the signal.) The base layer performs standard video compression: its
predictor consists only of motion compensation,
$\tilde{x}_b(n) = \mathrm{MC}[\hat{x}_b(n-1)]$, the residual
$r_b(n) = x(n) - \tilde{x}_b(n)$ is quantized, and the corresponding index is
transmitted. Let $[a, b]$ be the quantization interval, hence
$r_b(n) \in [a, b]$. Thus the information the base layer provides on $x(n)$
is captured in the statement:

$$x(n) \in [\tilde{x}_b(n) + a,\ \tilde{x}_b(n) + b].$$

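
As a simple illustration of how the interval is obtained in practice, the sketch below assumes a uniform scalar quantizer without a dead zone, so that index `k` corresponds to the residual interval `[(k - 0.5) * step, (k + 0.5) * step]`. The quantizer and the function names are assumptions made for this example; an actual codec's quantizer (with threshold or dead zone) would give different limits `a` and `b`.

```python
def quantizer_index(residual, step):
    """Index transmitted by the assumed uniform quantizer."""
    return round(residual / step)


def base_layer_interval(x_tilde_b, index, step):
    """Interval known to contain x(n), given the base prediction x_tilde_b
    and the transmitted quantizer index (uniform quantizer assumed)."""
    a = (index - 0.5) * step   # lower limit of the residual cell
    b = (index + 0.5) * step   # upper limit of the residual cell
    return x_tilde_b + a, x_tilde_b + b
```

The enhancement-layer decoder can form the same interval because both the base prediction and the quantizer index are recoverable from the base bit-stream.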
[0054] At the enhancement layer, the prediction exploits the information
available from both layers. The optimal predictor is given therefore by the
expectation:

$$\tilde{x}_e(n) = E\{x(n) \mid \hat{x}_e(n-1),\ x(n) \in [\tilde{x}_b(n) + a,\ \tilde{x}_b(n) + b]\},$$

which is conveniently rewritten as

$$\tilde{x}_e(n) = \bar{x}_e(n-1) + E\{z(n) \mid z(n) \in I_z(n)\},$$

where

$$\bar{x}_e(n-1) = \mathrm{MC}[\hat{x}_e(n-1)]$$

and the expectation interval is

$$I_z(n) = [\tilde{x}_b(n) + a - \bar{x}_e(n-1),\ \tilde{x}_b(n) + b - \bar{x}_e(n-1)].$$

[0055] This prediction is directly implemented using the model for $p(z)$ given
above:

$$\tilde{x}_e(n) = \bar{x}_e(n-1) + \frac{\int_{I_z(n)} z\, p(z)\, dz}{\int_{I_z(n)} p(z)\, dz}.$$

[0056] The integral may be analytically evaluated, and its closed form
solution, given explicitly in terms of the integral limits and the parameters
$\alpha$ and $\rho$, is normally used for simple implementation.
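
One possible closed-form evaluation is sketched below for a single coefficient. It follows from integrating the Laplacian term piecewise on either side of zero and adding the point mass $\rho^2$ to the denominator when the interval contains zero. The function names, the argument `x_bar_e_prev` (the motion-compensated previous enhancement reconstruction), and the fallback used when the interval carries no probability are assumptions of this sketch, not taken from the specification.

```python
import math


def truncated_innovation_mean(lo, hi, alpha, rho):
    """E{z | lo <= z <= hi} for the density
    p(z) = rho^2 delta(z) + (1 - rho^2) (alpha/2) exp(-alpha |z|)."""

    def mass_from_zero(t):
        # Signed integral of (alpha/2) exp(-alpha |z|) from 0 to t.
        return math.copysign(0.5 * (1.0 - math.exp(-alpha * abs(t))), t)

    def moment_from_zero(t):
        # Integral of z (alpha/2) exp(-alpha |z|) from 0 to t; even in t.
        u = abs(t)
        return 0.5 * (1.0 / alpha - (u + 1.0 / alpha) * math.exp(-alpha * u))

    c = 1.0 - rho * rho
    numerator = c * (moment_from_zero(hi) - moment_from_zero(lo))
    denominator = c * (mass_from_zero(hi) - mass_from_zero(lo))
    if lo <= 0.0 <= hi:
        # The point mass at z = 0 adds probability but no first moment.
        denominator += rho * rho
    if denominator <= 0.0:
        # No probability in the interval (e.g., rho = 1 and 0 outside it).
        # Fallback chosen for this sketch: the interval point nearest zero.
        return min(max(0.0, lo), hi)
    return numerator / denominator


def enhancement_prediction(x_tilde_b, a, b, x_bar_e_prev, alpha, rho):
    """Predicted coefficient: x_bar_e(n-1) + E{z | z in I_z(n)}."""
    lo = x_tilde_b + a - x_bar_e_prev
    hi = x_tilde_b + b - x_bar_e_prev
    return x_bar_e_prev + truncated_innovation_mean(lo, hi, alpha, rho)
```

When the motion-compensated enhancement reconstruction already lies at the center of the base-layer interval, the prediction leaves it unchanged; when it lies outside, the prediction is moved inside the interval, so the two sources of information never contradict each other.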

[0057] This embodiment of the invention is of low complexity, uses standard
video compression for its base layer, and provides substantial performance
gains which build up and increase with the number of layers implemented. Its
absence in all leading standards in spite of its gains and low complexity
strongly suggests that the invention is not obvious to the leading researchers
and developers in the field of video compression.

[0058] The scalable predictive coding method of the invention, although 
illustrated herein on a two or three-layer scalable system, is repeatedly 
applicable to further layers of enhancement in a straightforward manner. For 
example, at layer k we combine signal information from the current
reconstructed frame at layer k-1, and from the previous reconstruction frame 
at layer k. A higher complexity version allows for the combining rule to take 
into account data from all lower layers. In the special implementation 
described, information from all lower layers contributes to restricting the final 
interval within which the coefficient must lie. Another higher complexity 
version uses higher order prediction (based on multiple past frames). 

[0059] Another application of the invention pertains to time-varying channels,
such as mobile communications, and most common network communications. 
When the receiver experiences an improvement in channel conditions, it 
attempts to decode higher enhancement bits and improve the quality of the 
reconstruction. However, it cannot compute the enhancement layer prediction
as past enhancement layer reconstruction frames were not decoded and are 
not available. The present invention includes a solution to this problem, which 
comprises periodically (e.g., once per fixed number of frames) constraining 
the enhancement encoder to exclusively use lower layer information for the 
prediction. This periodic constrained prediction synchronizes the 
enhancement decoder with the enhancement encoder and allows the receiver 
to decode the enhancement-layer signals. The frequency of application of 
this constrained mode may be different for each layer and may be optimized 
for the time-varying channel statistics. The trade-off is between some
temporary degradation in prediction (when the prediction is constrained) and 
the receiver's capability to upgrade to enhancement layer performance as the 
channel conditions improve. 
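
The periodic constrained mode may be sketched as follows. The sketch assumes that the constrained prediction is simply the current lower-layer reconstruction and that synchronization frames occur at multiples of a fixed period; both choices, and all names, are hypothetical illustrations of the mode rather than a prescribed implementation.

```python
def is_sync_frame(n, sync_period):
    """True when frame n uses the constrained (lower-layer-only) prediction."""
    return n % sync_period == 0


def predict_enhancement(n, x_hat_b, x_hat_e_prev, estimator, sync_period):
    """Enhancement-layer prediction with periodic synchronization.

    On synchronization frames only lower-layer information is used, so a
    receiver without past enhancement reconstructions can form it as well.
    """
    if is_sync_frame(n, sync_period):
        return x_hat_b
    return estimator(x_hat_b, x_hat_e_prev)


def first_decodable_frame(join_frame, sync_period):
    """First frame index >= join_frame at which a receiver that lacks prior
    enhancement-layer reconstructions can begin enhancement decoding."""
    return -(-join_frame // sync_period) * sync_period
```

A shorter period lets receivers upgrade sooner after the channel improves, at the cost of using the weaker constrained prediction more often.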

[0060] Finally, it will be appreciated that the scalability advantages of the
invention may be easily combined with known methods for temporal and
spatial scalability.

[0061] Accordingly, it will be seen that this invention provides for efficient
scalability of predictive coding that is applicable to standard video and audio 
compression. The invention uses most or all of the information available at an 
enhancement-layer to improve the quality of the prediction. In addition, the 
invention provides for enhancement-layer synchronization to accommodate 
situations where the communication rate for a given receiver is time varying 
(e.g., in mobile communications). Although the description above contains 
many specificities, these should not be construed as limiting the scope of the 
invention but as merely providing illustrations of some of the presently 
preferred embodiments of this invention. Thus the scope of this invention 
should be determined by the appended claims and their legal equivalents. 






